{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "j4qMNV_533ls"
      },
      "source": [
        "##### Copyright 2024 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "urf-mQKk348O"
      },
      "outputs": [],
      "source": [
        "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iS5fT77w4ZNj"
      },
      "source": [
        "# Gemma - Run with Ollama Python library\n",
        "\n",
        "Author: Sitam Meur\n",
        "\n",
        "*   GitHub: [github.com/sitamgithub-MSIT](https://github.com/sitamgithub-MSIT/)\n",
        "*   X: [@sitammeur](https://x.com/sitammeur)\n",
        "\n",
        "Description: This notebook demonstrates how you can run inference on a Gemma model using  [Ollama Python library](https://github.com/ollama/ollama-python). The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_2]Using_with_Ollama_Python.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FF6vOV_74aqj"
      },
      "source": [
        "## Setup\n",
        "\n",
        "### Select the Colab runtime\n",
        "To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:\n",
        "\n",
        "1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.\n",
        "2. Select **Change runtime type**.\n",
        "3. Under **Hardware accelerator**, select **T4 GPU**."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4tlnekw44gaq"
      },
      "source": [
        "## Installation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AL7futP_4laS"
      },
      "source": [
        "Install Ollama through the offical installation script."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MuV7cWtcAoSV"
      },
      "outputs": [],
      "source": [
        "!curl -fsSL https://ollama.com/install.sh | sh"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FpV183Rv6-1P"
      },
      "source": [
        "Install Ollama Python library through the official Python client for Ollama."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0Mrj29SH-3OD"
      },
      "outputs": [],
      "source": [
        "!pip install -q ollama"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iNxJFvGIe48_"
      },
      "source": [
        "## Start Ollama\n",
        "\n",
        "Start Ollama in background using nohup."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5CX39xKVe9UN"
      },
      "outputs": [],
      "source": [
        "!nohup ollama serve > ollama.log &"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6YfDqlyo46Rp"
      },
      "source": [
        "## Prerequisites\n",
        "\n",
        "*   Ollama should be installed and running. (This was already completed in previous steps.)\n",
        "*   Pull the gemma2 model to use with the library: `ollama pull gemma2:2b`\n",
        "    *  See [Ollama.com](https://ollama.com/) for more information on the models available."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "oPU5dA1-B5Fn"
      },
      "outputs": [],
      "source": [
        "import ollama"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NE1AWlucBza_"
      },
      "outputs": [],
      "source": [
        "ollama.pull('gemma2:2b')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KL5Kc6HaKjmF"
      },
      "source": [
        "## Inference\n",
        "\n",
        "Run inference using Ollama Python library."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HN0nrhpmFUUB"
      },
      "source": [
        "### Generate"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "AC5FsDQuFZfb"
      },
      "outputs": [],
      "source": [
        "import markdown\n",
        "from ollama import generate\n",
        "\n",
        "# Generate a response to a prompt\n",
        "response = generate(\"gemma2:2b\", \"Explain the process of photosynthesis.\")\n",
        "print(response[\"response\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "p8v050JkFhuY"
      },
      "source": [
        "#### Streaming Responses\n",
        "\n",
        "To enable response streaming, set `stream=True`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "jw8UD2v5Fms6"
      },
      "outputs": [],
      "source": [
        "# Stream the generated response\n",
        "response = generate('gemma2:2b', 'Explain the process of photosynthesis.', stream=True)\n",
        "\n",
        "for part in response:\n",
        "  print(part['response'], end='', flush=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "utyVRlIYFvdm"
      },
      "source": [
        "#### Async client\n",
        "\n",
        "To make asynchronous requests, use the `AsyncClient` class."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "JVvKQE0XF2kh"
      },
      "outputs": [],
      "source": [
        "import asyncio\n",
        "import nest_asyncio\n",
        "from ollama import AsyncClient\n",
        "\n",
        "nest_asyncio.apply()\n",
        "\n",
        "\n",
        "async def generate():\n",
        "    \"\"\"\n",
        "    Asynchronously generates a response to a given prompt using the AsyncClient.\n",
        "\n",
        "    This function creates an instance of AsyncClient and sends a request to generate\n",
        "    a response for the specified prompt. The response is then printed.\n",
        "    \"\"\"\n",
        "    # Create an instance of the AsyncClient\n",
        "    client = AsyncClient()\n",
        "\n",
        "    # Send a request to generate a response to the prompt\n",
        "    response = await client.generate(\n",
        "        \"gemma2:2b\", \"Explain the process of photosynthesis.\"\n",
        "    )\n",
        "    print(response[\"response\"])\n",
        "\n",
        "# Run the generate function\n",
        "asyncio.run(generate())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WFyi_zzWAwe7"
      },
      "source": [
        "### Chat"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UN20MoSv_S76"
      },
      "outputs": [],
      "source": [
        "from ollama import chat\n",
        "\n",
        "# Start a conversation with the model\n",
        "messages = [\n",
        "    {\n",
        "        \"role\": \"user\",\n",
        "        \"content\": \"What is keras?\",\n",
        "    },\n",
        "]\n",
        "\n",
        "# Get the model's response to the message\n",
        "response = chat(\"gemma2:2b\", messages=messages)\n",
        "print(response[\"message\"][\"content\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HZGsL1FI7kj8"
      },
      "source": [
        "#### Streaming Responses\n",
        "\n",
        "To enable response streaming, set `stream=True`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UdxAuS1C7lm_"
      },
      "outputs": [],
      "source": [
        "# Stream the chat response\n",
        "stream = chat(\n",
        "    model=\"gemma2:2b\",\n",
        "    messages=[{\"role\": \"user\", \"content\": \"What is keras?\"}],\n",
        "    stream=True,\n",
        ")\n",
        "\n",
        "for chunk in stream:\n",
        "    print(chunk[\"message\"][\"content\"], end=\"\", flush=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "czWceNYOEizg"
      },
      "source": [
        "#### Async client + Streaming\n",
        "\n",
        "To make asynchronous requests, use the `AsyncClient` class, and for streaming, use `stream=True`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "YJmd92z1IUVl"
      },
      "outputs": [],
      "source": [
        "import asyncio\n",
        "import nest_asyncio\n",
        "from ollama import AsyncClient\n",
        "\n",
        "nest_asyncio.apply()\n",
        "\n",
        "\n",
        "async def chat():\n",
        "    \"\"\"\n",
        "    Asynchronously sends a chat message to the specified model and prints the response.\n",
        "\n",
        "    This function sends a message with the role \"user\" and the content \"What is keras?\"\n",
        "    to the model \"gemma2:2b\" using the AsyncClient's chat method. The response is then streamed.\n",
        "    \"\"\"\n",
        "    # Define the message to send to the model\n",
        "    message = {\"role\": \"user\", \"content\": \"What is keras?\"}\n",
        "\n",
        "    # Send the message to the model and print the response\n",
        "    async for part in await AsyncClient().chat(\n",
        "        model=\"gemma2:2b\", messages=[message], stream=True\n",
        "    ):\n",
        "        print(part[\"message\"][\"content\"], end=\"\", flush=True)\n",
        "\n",
        "# Run the chat function\n",
        "asyncio.run(chat())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cDr8VzvGIXFC"
      },
      "source": [
        "## Conclusion\n",
        "\n",
        "Congratulations! You have successfully run inference on a Gemma model using the Ollama Python library. You can now integrate this into your Python projects."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "name": "[Gemma_2]Using_with_Ollama_Python.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
